A Component-based Methodology to Adapt the Fault Tolerance of Distributed Protocols
نویسندگان
چکیده
Nowadays, there are many protocols able to cope with process crashes, but, unfortunately, a process crash represents only a particular faulty behavior. Handling tougher failures (e.g. sending omission failures, receive omission failures, arbitrary failures) is a real practical challenge due to malicious attacks or unexpected software errors. This paper proposes a component-based methodology allowing, for example, to take a protocol A resilient to crash failures and to add software components in order to adapt the protocol A to be resilient to more general failures than crash. On this basis, it introduces the notions of liveness failure detector and safety failure detector, two independent software components to be used by a protocol to increases its resilience respectively to liveness and safety failures of processes running the protocol. Then, the feasibility of this approach is shown, by providing an implementation of liveness failure detectors and of safety failure detectors for a protocol solving the problem of global data computation.
منابع مشابه
Comparison of Failure Detectors and Group Membership: Performance Study of Two Atomic Broadcast Algorithms
Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we present a performance evaluation methodology that can be generalized to analyze many kinds of fault-tolerant alg...
متن کاملImproving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
متن کاملOutlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
متن کاملA generalized ABFT technique using a fault tolerant neural network
In this paper we first show that standard BP algorithm cannot yeild to a uniform information distribution over the neural network architecture. A measure of sensitivity is defined to evaluate fault tolerance of neural network and then we show that the sensitivity of a link is closely related to the amount of information passes through it. Based on this assumption, we prove that the distribu...
متن کاملThe Delta-4 approach to dependability in open distributed computing systems
1.1 Fault-tolerance As part of the European Strategic Programme for Research in Information Technology (ESPRIT), the Delta-4 project is seeking to define an open, faulttolerant, distributed computing architecture. This paper presents the overall Delta-4 framework for open, fault-tolerant, distributed computing systems and sketches the current implementation which is based on a local area networ...
متن کامل